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Abstract 


Let F(z) be a vector- valued function F : C C N , which is analytic at z = 0 and meromorphic 
in a neighborhood of z = 0, and let its Maclaurin series be given. In a recent work [Si6] by 
the author, vector-valued rational approximation procedures for F(z) that are based on its 
Maclaurin series, were developed, and some of their convergence properties were analyzed in 
detail. In particular, a Koenig type theorem concerning their poles and a de Montessus type 
theorem concerning their uniform convergence in the complex plane were given. With the help of 
these theorems it was shown how optimal approximations to the poles of F(z) and the principal 
parts of the corresponding Laurent series expansions can be obtained. In the present work we 
use these rational approximation procedures in conjunction with power iterations to develop 
bona fide generalizations of the power method for an arbitrary N x N matrix that may be 
di agon aliz able or not. These generalizations can be used to obtain simultaneously several of the 
largest distinct eigenvalues and corresponding eigenvectors and other vectors in the invariant 
subspaces. We provide interesting constructions for both nondefective and defective eigenvalues 
and the corresponding invariant subspaces, and present a detailed convergence theory for them. 
This is made possible by the observation that vectors obtained by power iterations with a matrix 
are actually coefficients of the Maclaurin series of a vector-valued rational function, whose poles 
are the reciprocals of some or all of the nonzero eigenvalues of the matrix being considered, 
while the principal parts of the Laurent expansions of this rational function are vectors in the 
corresponding invariant subspaces. In addition, it is shown that the generalized power methods 
of this work are equivalent to some Krylov subspace methods, among them the methods of 
Arnoldi and Lanczos. Thus, the theory of the present work provides a set of completely new 
results and constructions for these Krylov subspace methods. This theory suggests at the same 
time a new mode of usage for these Krylov subspace methods that has been observed to possess 
computational advantages over their common mode of usage. 
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1 Introduction 


Let F(z) be a vector- valued function, F : C — + which is analytic at z = 0 and meromorphic 
in a neighbourhood of z = 0, and let its Maclaurin series be given as 

oo 

F(Z) = £ Um* m , (1.1) 

m ==0 


where « m are fixed vectors in C N . 


In a recent work by the author [Si6] three types of vector-valued rational approximation proce- 
dures that are entirely based on the expansion in (1.1) were proposed. For each of these procedures 
the rational approximations have two indices, n and k, attached to them, and thus form a two- 
dimensional table akin to the Pad6 table or the Walsh array. Let us denote the (n, it) entry of this 
table by Then /’„,*(■?)) if it exists, is defined to be of the form 


J? ,(r\ — ^J=° C J ’ ^n+v+ ii*) _ Pn,k(z) (n.Jt) 

EjU cWzk-i ~ QnM ^ ~ QnA0) ~ ^ 

where v is an arbitrary but otherwise fixed integer > - 1 , and 

m 

fm(^) = ^2 m - 0, 1,2,...; F m (z ) = 0 for m < 0, 

1=0 

and the c^ n,k ^ are scalars that depend on the approximation procedure being used. 


( 1 . 2 ) 


(1.3) 


If we denote the three approximation procedures by SMPE, SMMPE, and STEA, then the 
c j * = c jy f° r each of the three procedures, are defined such that they satisfy a linear system of 

equations of the form 

^ v U,'j Cj — — ^iky 0<i<k lj Cfc “ 1, (1.4) 

0 

where are scalars defined as 


Uij - < 


( u n+i > ) 

(ft+1 y u n+j) 


for SMPE, 
for SMMPE, 
for STEA. 


(1.5) 


Here (• , •) is an inner product - not necessarily the standard Euclidean inner product - whose 
homogeneity property is such that (ax,(3y) = a0(x,y) for x y yeC N and a,/? € C. The vectors 
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Qii 72 1 •••) form a linearly independent set, and the vector q is nonzero. Obviously, F n ^(z) exists if 
the linear system in (1.4) has a solution for c 0 ,cj,...,c*_i. 


It is easy to verify that for SMPE the equations in (1.4) involving co,ci, ..., c/t — i are the normal 
equations for the least squares problem 


min 

CO,Ct,—,Ck-l 


k-i 

C 3 U "+i ■f’ u n+k , 

3 = 0 


where the norm || • || is that induced by the inner product (• , •), namely, ||i|| = <J(x,x). 


( 1 . 6 ) 


As is clear from (1.2) and (1.3), the numerator of F n ,k(z) is a vector-valued polynomial of degree 
at most n 4- u + k, whereas its denominator is a scalar polynomial of degree at most k. 


As can be seen from (1.4) and (1.5), the denominator polynomial Q n> k(z) is constructed from 
«n,«n+i 1 -iVt f° r SMPE and SMMPE, and from u n >«n+i,—,«n+2Jfc-i for STEA. Once the 
denominators have been determined, the numerators involve tt 0 , «i, ..., u n+u+ k for all three approx- 
imation procedures. 


The approximation procedures above are very closely related to some vector extrapolation meth- 
ods. In fact, as is stated in Theorem 2.3 in Section 2 of [Si6], F n ,k{z) for SMPE, SMMPE, and 
STEA are obtained by applying some generalized versions of the minimal polynomial extrapola- 
tion (MPE), the modified minimal polynomial extrapolation (MMPE), and the topological epsilon 
algorithm (TEA), respectively, to the vector sequence F m (z),m = 0, 1,2,... . For early references 
pertaining to these methods and their description, see the survey paper of Smith, Ford, and Sidi 
[SmFSi], and for recent developments pertaining to their convergence, stability, implementation, 
and other additional properties, see the papers by Sidi [Sil, Si2, Si5], Sidi and Bridger [SiB], Sidi, 
Ford, and Smith [SiFSm], and Ford and Sidi [FSi]. The above mentioned generalizations of vector 
extrapolation methods are given in [SiB, eqs.(1.16) and (1.17).]. 

A detailed convergence analysis for the approximations F ni k(z) as n -+ oo was given in [Si6], 
whose main results can be verbally summarized as follows: (1) Under certain conditions the de- 
nominators Q n ,k(z) converge, and their zeros, k in number, tend to the k poles of F(z) that are 
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closest to the origin. This is a Koenig type result and is proved in Theorems 4.1 and 4.5 of [Si6], 
where the precise rates of convergence are also given for both simple and multiple poles of F(z ), 
and optimal approximations to multiple poles are constructed in a simple way. (2) Under the same 
conditions F n ^(z) converges to F{z) uniformly in any compact subset of the circle containing the 
above mentioned k poles of F(z) with these poles excluded. This is a de Montessus type theorem 
and is proved in Theorem 4.2 of [Si6]. (3) The principal parts of the Laurent expansions of F(z) 
about its poles, simple or multiple, can be constructed from -F n ,*(*) only. This construction, along 
with its convergence theory, is provided in Theorem 4.3 of [Si6]. 

It turns out that the denominator polynomials Q n ,k( z ) are ver y closely related to some recent 
extensions of the power method for the matrix eigenvalue problem, see [SiB, Section 6] and [Si3] . 
Specifically, if the vectors u m of (1.1) are obtained from u m = Au m _ 1 , m = 1,2, ..., with uq arbitrary, 
and A being a complex N xN and, in general, nondiagonalizable matrix, then the reciprocals of the 
zeros of the polynomial Q nt k(z) a-re approximations to the k largest distinct and, in general, defec- 
tive eigenvalues of A, counted according to their multiplicities, under certain conditions. In Section 
3 of the present work we provide precise error bounds for these approximations for n — ► oo that are 
based on the results of Theorems 4.1 and 4.5 of [Si6]. While the approximations to nondefective 
eigenvalues have optimal accuracy in some sense, those that correspond to defective eigenvalues do 
not. In this paper we also show how approximations of optimal accuracy to defective eigenvalues can 
be constructed solely from Q n ,k(z), providing their convergence theory for n — ► oo at the same time. 
We then extend the treatment of [SiB] and [Si3j to cover the corresponding invariant subspaces in 
general, and the corresponding eigenvectors in particular. For example, we actually show how the 
eigenvectors corresponding to the largest distinct eigenvalues, whether these are defective or not, 
can be approximated solely in terms of the vectors Uj, and provide precise rates of convergence for 
them. The key to these results is the observation that the vector- valued power series u mZ m 

actually represents a vector- valued rational function F(z) whose poles are the reciprocals of some 
or all of the nonzero eigenvalues of A, depending on the spectral decomposition of uo, and that 
corresponding eigenvectors (and certain combinations of eigenvectors and principal vectors) are 
related to corresponding principal parts of the Laurent expansions of the function F(z ). The main 
results of Section 3 pertaining to eigenvalues are given in Theorem 3.1, while those pertaining to 
eigenvectors and invariant subspaces are given in Theorem 3.2 and the subsequent paragraphs. A 
detailed description of the properties of the power iterations u m = Au m _i, m = 1,2, ..., is provided 
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in Section 2. 


In Section 4 we present a short review of genera] projection methods and Krylov subspace 
methods for the matrix eigenvalue problem. Of particular interest to us are the methods of Arnoldi 
[A] and Lanczos [L], which are described in some detail in this section. 

In Section 5 we show that the extensions of the power method developed and analyzed in Section 
3 are very closely related to Krylov subspace methods. In particular, we show that the reciprocals 
of the poles and the corresponding residues of the rational approximations F ni k(z) obtained from 
the SMPE, SMMPE, and STEA procedures are the Ritz values and the Ritz vectors, respectively, 
of certain Krylov subspace methods for the matrix A starting with the power iteration u n . Specifi- 
cally, the methods of Arnoldi and Lanczos are related to the F n k(z) obtained from the SMPE and 
STEA procedures, respectively, precisely in this sense. The main results of Section 5 are summa- 
rized in Theorem 5.4 and Corollary 5.5. 

Now the Ritz values and Ritz vectors obtained from Krylov subspace methods are normally 
used as approximations to nondefective eigenpairs. They are not very effective for defective eigen- 
pairs. Since we know that the generalized power methods based on the SMPE, SMMPE, and 
STEA procedures are related to Krylov subspace methods, the constructions for approximating 
defective eigenvalues and their corresponding invariant subspaces that originate from generalized 
power methods and that are given in Section 3 are entirely new as far as Krylov subspace methods 
are concerned. Similarly, all of the convergence results of Section 3, whether they pertain to defec- 
tive or nondefective eigenvalues and their corresponding invariant subspaces, are new and totally 
different from the known analyses provided by Kaniel [K], Paige[Pai], and Saad[Sal,Sa2]. Some of 
these analyses can also be found in Parlett [Par2] and Golub and Van Loan[GV]. The last two refer- 
ences also give a very thorough treatment of the computational aspects of Krylov subspace methods. 

In Section 6 we show how the Ritz values and Ritz vectors obtained in a stable way from the 
common implementations of the Aroldi and Lanczos methods can be used in constructing the ap- 
proximations to the defective eigenvalues and their corresponding invariant subspaces in general 
and eigenvectors in particular. 
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In view of the connection between the Krylov subspace methods and vector-valued rational 
approximations of [Si6] and the corresponding generalized power methods of the present work, we 
can summarize the main contributions of this paper as follows: 

(i) It is shown that Krylov subspace methods for the matrix eigenvalue problem have a basis 
in analytic function theory and in rational approximations in the complex plane. 

(ii) A mode of usage of Krylov subspace methods akin to the power method, in which one 
first iterates on an arbitrary initial vector many times and only then applies Krylov subspace 
methods, is proposed. This mode produces approximations only to the largest eigenvalues 
and their corresponding invariant subspa ces . 

(iii) The output from Krylov subspace methods, namely, the Ritz values and Ritz vectors, are 
used in constructing optimal approximations to defective eigenvalues and the corresponding 
eigenvectors and invariant subspaces. (The Ritz values and Ritz vectors by themselves are 
not good approximations to defective eigenvalues and corresponding subspaces.) 

(iv) A complete convergence theory for the generalized power methods is provided. 

(v) Numerical experience shows that in many cases the mode of usage proposed in this work 
and mentioned in (ii) above may produce the accuracy that is achieved by applying the 
Arnoldi method in the commonly known way using less storage and less computational work 
when the matrix being treated is large and sparse. 

Before closing this section we would like to note that the eigenvalue problem for defective ma- 
trices has received some attention in the literature. The problem of approximating the largest 
eigenvalue of a matrix when this eigenvalue is defective has been considered by Ostrowski[0], who 
proposes an extension of the Rayleigh quotient and inverse iteration and gives a thorough analysis 
for this extension. Parlett and Poole [ParPo] consider the properties of a wide range of projection 
methods within the framework of defective matrices. The convergence of the QR method for defec- 
tive Hessenberg matrices has been analyzed in detail by Parlett [Pari]. The problem of determining 
the Jordan canonical form of nondefective matrices has been treated in Golub and Wilkinson [GW]. 
The use of power iterations in approximating defective eigenvalues is also treated to some extent 
in Wilkinson [W, Chap. 7] and Householder [H, Chap. 7]. 
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Finally, we mention that the results of [Si6] as well as the application of vector- valued rational 
approximations to the matrix eigenvalue problem were motivated by the developments in a recent 
work by the author [Si4] on the classical Pad4 approximants. 


2 Properties of Power Iterations 

Let A be an AT x N matrix, which, in general, is complex and nondiagonalizable. Let u 0 be a 
given arbitrary vector in C N , and generate the vectors «i,u 2 , according to 


«j+i = Auj, j > 0. 


( 2 . 1 ) 


Denote by s be index of A, i.e., the size of the largest Jordan block of A with zero eigenvalue. Then 
u m is of the form 


M 

«m = J2 


Pj ( 

,=0 v 


m 

/ 


A™, for m> s, 


( 2 . 2 ) 


where \j are some or all of the distinct nonzero eigenvalues of A, which we choose to order such 
that 

|Ai|>|A 2 |>|A 3 |>--->|A m |>0, (2.3) 


Pj + 1 — Wj are positive integers less than or equal to the dimension of the invariant subspace 
of A belonging to the eigenvalue Aj, and dji , 0 < / < are linearly independent vectors in this 
invariant subspace. It turns out that the vector aj p , . is an eigenvector of A corresponding to A ; , 
while the vectors fij,-, i = 0, 1, ...,Pj — 1, are combinations of eigenvectors and principal vectors of 
A corresponding to the eigenvalue A What is more, the subspaces 


Yi — span {dji,i < / K pj}, t — 0, 

are invariant subspaces of A corresponding to the eigenvalue Aj, and satisfy Y 0 D Yi D ... D Y Pj . 


Whether all distinct nonzero eigenvalues are present among Ai,A 2 , ...,Aa/, the exact values of 
the Uj , and the precise composition of the vectors aji , all depend on the spectral decomposition 
of the initial vector Uq. For a detailed derivation of the above see [SiB, Section 2]. 


Before we go on, we will only mention how to determine the maximum value that uj can assume. 
Suppose that the Jordan canonical form of A has several Jordan blocks whose eigenvalues are all 
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equal to A j. Then the largest value that u>j can assume is the size of the largest of these blocks. 
In general, for a randomly chosen vector uq , Uj will take on its maximum value. In cases uj is 
theoretically less than this maximum value, rounding errors on a computer will ultimately force u>j 
to take on its maximum value. 


It is obvious from the above that 

M M 

ko = J>; + 1) = < N 

3=1 1 

and 


(2.4) 


®>«» 0 ^ * < Pj, 1 < 3 < M, are linearly independent. (2.5) 

Also the minimal polynomial of the matrix A with respect to the vector u a has degree Jb 0 = i u j, 
i.e., 

k 0 = min jfc : u a = 0, = 1 

If defined as a monic polynomial, this polynomial is unique and divides the minimal polynomial of 
A, which, in turn, divides the characteristic polynomial of A. Furthermore, the minimal polynomial 
of A with respect to u a is also the minimal polynomial of A with respect to u m for all m > s. Con- 
sequently, any set of vectors {u m , u m +i > Um+Jt} is linearly independent for m > s provided k < k 0 . 



Applying now Lemma 3.1 of [Si6] in conjunction with (2.2), we conclude that the vector-valued 
power series )Cm=o u mZ m represents the vector- valued rational function 


M Pj 

no = E E 




+ G ( Z )> 


U to (1 - V) i+1 

in which the vectors aj, are uniquely determined in terms of the dji from 


0 < / < Pi , 1 < j < M, 


( 2 . 6 ) 



(2.7) 


and hence form a linearly independent set, and G(z) is a vector-valued polynomial of degree at 
most s 1. In fact, G(z ) is in the invariant subspace of A corresponding to the zero eigenvalue. 
Also, aj Pj = dj Pj , i.e., aj Pj is an eigenvector of A corresponding to the eigenvalue A j, while for 
each i, 0 < t < pj - 1, aji is some other vector in the invariant subspace Y t corresponding to the 
eigenvalue A j, and involves principal vectors as well as eigenvectors. 
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When the matrix A is diagonalizable, pj = 0 for all j in (2.2) and hence in (2.6). If, in addition, 
A is normal, then its eigenvectors form an orthogonal set with respect to the standard Euclidean in- 
ner product, namely, (x,y) = x*y, where x* stands for the hermitian conjugate of x. Consequently, 
the vectors a j0 = a j0 in (2.2) and (2.6) are orthogonal with respect to this inner product when A 
is normal. 


Now that we have shown that the power series £m=o represents a rational function F(z) 

that is analytic at z = 0 and has poles Zj = A" 1 of respective multiplicities — Pj + 1, j = 
l,2,...,Af, we can apply any one of the approximation procedures SMPE, SMMPE, or STEA to 
the power series « m z m to obtain the vector- valued rational approximations F„ t k(z) to F(z). 

We can then apply the theorems of Sections 4 and 5 of [Si6] to construct approximations to the 
eigenvalues A j and the vectors aji in (2.6) and (2.7). 


It is important to note that the linear independence of the vectors aji is an important condi- 
tion for the convergence of the SMPE and SMMPE procedures, but is not needed for the STEA 
procedure. In addition, we assume throughout that 


(?i» a io) ■■■ (*7i» a ipi) • (9i* fl to) 

(9fc,aio) ••• (?*,a lpi ) ... ( qk,a t o ) 


(?i» a tp t ) 


/ 0 for SMMPE, 


(9*> a ip«) 


and that 

t 

II(9> a ip>) # 0 for STEA. 

j=i 

No additional assumption is needed for SMPE. 


( 2 . 8 ) 


(2.9) 


In order for (2.8) to hold it is necessary (but not sufficient) that the two sets of vectors 
{aji : 0 < i < pj, 1 < j < t) and qk}, each be linearly independent, as has already been as- 

sumed. 


3 Theoretical Development of Generalized Power Methods 

In light of the developments of the previous section and Theorems 4.1, 4.3, and 4.5 of [Si6] and 
the developments of Section 5 in the same paper, we approach the matrix eigenvalue problem as 
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follows: 


Given the vector uq that is picked arbitrarily, we generate the vectors U\,U 2 , ..., according to 
(2.1). We then fix the integers n and k , and determine the coefficients Cj n ' k \ j = 0, 1, ... y k, of the 
denominator polynomial of F n ^{z) for one of the procedures SMPE, SMMPE, and STEA, by using 
«n,«n+i ,-.,«n+Jk for SMPE and SMMPE, and u n ,u n+1 ,... ,«n+2*-i, for STEA. By Theorem 4.1 
of [Si6] the zeros of the polynomial $ ni jt(A) = X~ k Q ni k(X~ l ) = Y)j=o are approximations 

to the k largest Xj in (2.2), counted according to their multiplicities u>j , provided the conditions 
stated in this theorem are satisfied. In case the matrix A is normal, the zeros of the polynomial 
0n,jt(A), obtained from SMPE and STEA with the standard Euclidean inner product, are even 
better approximations to the eigenvalues A j of A as follows from Theorem 4.5 of [Si6]. 

3.1 Treatment of Eigenvalue Approximations 

Theorem 3.1 below, which is of constructive nature, summarizes all the relevant results con- 
cerning the approximations to the Xj. The corresponding approximations to eigenvectors and other 
vectors in the invariant subspaces are subsequently obtained with the help of the developments in 
Section 5 of [Si6], and the relevant results for this problem are summarized in Theorem 3.2 below. 

We note that we have adopted in this section all of the notation of the previous sections. 

Theorem 3.1: Let the matrix A and the vector sequence u m ,m = 0,1,2,..., be as described in the 
preceding section. Let the positive integers t and k be such that 

t t 

| A, | > |A t+1 | and k = + 1) = J>j. (3.1) 

;=i i= i 

Determine the coefficients c^' k \j = 0,1,..., A:, for one of the procedures SMPE, SMMPE, and 
STEA, by utilizing u n , u n+1 , ..., as described in (1.4) and (1.5). Then, under the additional condi- 
tions given in (2.8) and (2.9), 

k t 

0„,*(A) = £ M = J I(A - A jY> + 0(c(n)) as n -+ oo, (3.2) 

;=o 

where 

,(„) = „« ^i±l ", (3.3) 
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a being some nonnegative integer. In fact, if the X j whose moduli are |A ( | are simple, then a = p, 
where Jf = max{pj : |Aj| = |A <+ i|}. Consequently, the polynomial ^^(X), for n — ► oo, has u>j zeros 
Xji(n), 1 < / < u>j, that tend to A j = 1,2 ,...,t. For each j and l we have 



Xji(n ) - A j = 0 ^ J (n) 1 / < ‘ ,j j as n — » oo, 

(3.4) 

where 

«>w = »” 

(3.5) 

Let us denote 

Aj(n) = — ^ Aj/(n) or A j(n) = — 5"! Xji(n)~ l 

l = l w i <= i J 

(3.6) 

Then 

Aj(n) — A j = 0(6j(n )) as n — ► oo. 

(3.7) 

Also, the pjth derivative of Q n ,k( A) has exactly one zero X j(n) that tends to Xj and satisfies 



A j(n) — Xj = 0(^j(n)) as n — ► oo. 

(3.8) 


Let the matrix A be normal, i.e., AA* = A* A. Then pj = 0 hence Uj = 1 for all j. If the 
^ ore determined through the procedures SMPE and STEA with the standard Euclidean inner 
product, and k is such that 

I Afe| > |Afc +1 |, (3.9) 

and provided q = u n for STEA, then (6.8) and (6.10) are substantially improved to read, respec- 
tively, 

On,k{X) = JJ(A - Aj) + O f I \ as n -4 00, (3.10) 

j= l \ Xk \ / 

and, for j = 1, k, 

A j(n) - Xj = O ^ | ^ as n -* oo, (3.11) 

where X j(n) is the unique zero of § „ ifc (A) that tends to Xj. 

We would like to note again that the result in (3.2) and (3.3) was originally given in [SiB, 
Section 6, Theorem 6.1], and those in (3.10) and (3.11) were originally given for SMPE in [Si3], 
The rest of Theorem 6.1 is new. As mentioned in these papers, the methods contained in Theorem 
3.1 are true extensions of the classical power method. 
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One important aspect of Theorem 3.1 is the construction of optimal approximations to defective 
eigenvalues through (3.6) and (3.7). From (3.4) it is clear that when pj = 0 hence u>j = 1, which 
occure automatically if Xj is a nondefective eigenvalue, the rate of convergence of the approxima- 
tion corresponding to Xj is optimal. In case Xj is a defective eigenvalue and pj > 0, the rate of 
convergence of each of its Uj corresponding approximations is 1/c Jj of the optimal rate. For this 
case (3.6) and (3.7) show how the poor approximations Xji(n ) can be combined in a simple way 
to give an optimal approximation, namely Aj(n). Similarly, (3.8) shows that X ,(n), the zero of 
the pjth derivative of $ n) k(A) that tends to Xj, has the same optimal convergence rate as A j(n). 
The results in (3.10) and (3.11) show that the approximations obtained from SMPE and STEA for 
a normal matrix converge twice as fast as those obtained for a nonnormal diagonalizable matrix 
having the same spectrum. 

3.2 Treatment of Invariant Subspace Approximations 

For the treatment of the eigenvectors and invariant subspaces we need some preliminary work. 


Let us rewrite (2.6) in the form 


M Pj 

*■(*) = EE 




+ G(z) , 


(3.12) 


where 

Zj = Xj 1 and dj, = (—Zj) t+1 ajj for ally,*. (3.13) 

Thus the da are the coefficients of the principal part of the Laurent expansion of F(z ) about the 
pole Zj, j = 1, .. M M. 


Consider the rational function 


p(-\ _ F(z) — F n + U (z) 

which is analytic at z = 0 and has the Maclaurin series expansion 


By (3.12) we can write 


“ E u n+v+i + !**• 
*=0 




(3.14) 


(3.15) 


(3.16) 
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where 


-n - v - 1 

l-i 


(3.17) 






&nd (jj(z^ is analytic at zj , i.e. , as above, the dji are coefficients of the principal part of the Laurent 
expansion of F{z ) about the pole zj,j = 1 ,...,M. Unlike before, both F{z) and the dji depend on 
n, in addition. The vector dj Pj , being a scalar multiple of the constant vector dj Pj , is an eigenvector 
of A corresponding to the eigenvalue A j. For i ^ pj , the vector dj being a linear combination 
of the constant vectors dji,i < l < pj , is in the invariant subspace Yi, and, as is obvious from 


(3.17), the coefficients of the dji in this linear combination are polynomials in n, up to the common 
multiplicative factor z~ n ~ l/ ~ l . 


Following now the developments in Section 5 of [Si6], we obtain the following constructive result 
for the dji. 


Theorem 3.2: With the notation and conditions of Theorem 3.1, let us define, for 1 < j < t, 

0(«) = 1 A,(n) or Cj(n) = 1/Aj(n), (3.18) 


and, for 0 < i < pj and 1 < / < Uj, 


. 'Y 1 ' r i n ’ k ) ,k-r V r _m-l 

d - ,(n\ — (z - C (nW * Cr 2 2^m=l u n+u+mZ ( 

‘ C,<)) 


and 


dji( n ) — r djjj(n). 

/= 1 

Then , for 0 < % < pj, dji(n) is an approximation to dji in (3.17) in the sense 


(3.19) 


(3.20) 


lim sup 


d : ,(n) - d 


il/n 






(3.21) 


We would like to note that Theorem 3.2 actually contains the basic ingredients of a potentially 
bona fide numerical method for approximating the eigenvectors and other vectors in invariant 
subspaces corresponding to largest eigenvalues of A. The resulting method, which is described 
below, (i) makes use of only u n ,u„+i, ..., disregarding «o»ui,...,u n _i entirely, and (ii) enables us 
to construct optimal approximations to the vectors a ;i , 0 < t < pj, for pj = 0 as well as pj > 0. We 
now turn to these constructions. 
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3.2.1 Approximation of the Eigenvector a JPj 

Let us first specialize the result of Theorem 3.2 to the case i = pj. We have 


J . — \#+i41j. 

a jpj ~ A j a jpji 


so that (3.21) can also be written as 


limsup \j n v 1 dj Pj (n ) 


*3Pj 


**5 

^(+1 




(3.22) 


(3.23) 


This clearly shows that the vector dj Pj (n), as n — ► oo, aligns itself with the constant vector dj Pj , 
which is proportional to the eigenvector a JPj , practically at the rate of |A t+1 /Aj| n . It is thus 
sufficient to compute the vectors djij(n), 1 < / < Wj, by (3.19), and then to form dji(n) by (3.20) 
as our approximation to the (appropriately normalized) eigenvector a JPj , and this is valid whether 
Pj = 0 or pj > 0. 


3.2.2 Approximation of the Vectors aji, 0 < i < pj - 1 

Although the vector aj Pj (up to a multiplicative constant) can be determined from dj Pj (n) in a 
rather painless manner, the determination of the remaining Oj,- from the dji(n ) becomes somewhat 
involved. The reason for this is that the vectors dji, apart from the scalar multiplicative factor 
zj n ~ v ~ x , are linear combinations of the dji hence of the aji, t < l < pj , with coefficients that vary 
as functions of n, as can be seen from (3.17) and (3.13), and as has been mentioned before. This 
means that the dji do not have a fixed direction with varying n. 


Let us now rewrite (3.17) in the form 


T(n) 


where T{n) is the upper triangular matrix 


i 

O 


1 

o 

dji 


dji 

\ 

3 

• 

t 

t 


f 

V*. 


(3.24) 


n»)= 


Too Toi 
T"ll 


T Opj 
T lpj 

T P)P] J 


Til - 


—n - v — 1 

l-i 


z ■ l+t all t and /. 


(3.25) 
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Obviously, T(n ) is invertible since its diagonal elements are unity. Thus, 


1 

o 


■ — 
o 

i 



... ^ 

II 

►-» 

4 

z j > 

(3.26) 

L m 


* 

v?-> 

0 s 

1 




where T(n ) 1 is also upper triangular, its diagonal elements being unity. 


Now since all elements of T(n) are polynomials in n, and since its determinant is unity, the 
elements of T’(n) -1 turn out to be polynomials in n, i.e., the matrix T(n) _1 can grow at most 
polynomially as n — *• oo. If we denote the nonzero elements of T(n) -1 by pu, i < l < pj, 0 < t < pj, 
then we can write (3.26) in the form 


pj 


dji = zj +l,+1 dji, 0 < i < pj. 

1st 


(3.27) 


Let us replace dji in (3.27) by | (djt — dyj(n)) + dji(n ) j , and invoke (3.21). After some manip- 
ulation we obtain 

k - *" + " +1 X > 4 (»)| 


limsup 

n— *• oo 


/=! 


1 /rt 

< 

•^<+1 


A ; 


(3.28) 


This implies that the vector Pu djl( n ) aligns itself with the fixed vector dji as n — ► oo prac- 
tically at the rate of |A<+i/Aj| n . We leave the details of the proof of (3.28) to the reader. 


We note that (3.28) shows how to construct a good approximation to dji from the dji(n) and 
A j, provided A j is known. Since A j is not known, however, the vector Y^LiPudji(n) cannot be con- 
structed. We, therefore, propose to replace A j in the matrix T’(n) -1 by the known approximations 
Cj(»). Also, in this case, it can be shown that (3.28) remains valid. Again, we leave the details of 
the proof to the reader. 


Before closing this section, we must mention that the developments of this section are meant to 
be theoretical, in general. Although they can be used for computational purposes for small values of 
k t their use for large k is likely to introduce numerical instabilities in many cases. These instabilities 
are mainly a result of our direct use of the power iterations u n+l - = A'u n , i = 0, 1, ... . They exhibit 
themselves first of all through the poor computed approximations to the Ay, which ultimately affect 
the computed eigenvector approximations. This problem can be remedied by observing that the 
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approximations F ni k(z) that we developed and applied to the matrix eigenvalue problem are very 
tightly connected with Krylov subspace methods for some of which there exist computationally 
stable implementations. In particular, the SMPE and STEA procedures are related to the method 
of Arnoldi and the method of Lanczos, respectively, as we show in detail in the next two sections. 


4 General Projection Methods and the Methods of Arnoldi and 
Lanczos for the Matrix Eigenproblem 

4.1 General Projection Methods 

Let {t>i, ...,Vfc} and {tuj, ...,u;*} be two linearly independent sets of vectors in C N , and define 
the N x k matrices V and W by 

V = [vi|v 2 | •••!«*] and W = [u>i|t/> 2 | • • -|w*]. (4.1) 

In addition, let us agree to denote the subspaces span {t>i, ..., t>*} and span {uq, ..., Wk} by V and 
W, respectively. 

In projection methods one looks for an approximate eigenvalue-eigenvector pair (A, x ) with 
x 6 V that satisfies the condition 

(y, Ax — Ax) = 0 for all y £ W, (4.2) 

which can also be written in the equivalent form 

W*(A — A I)V£ — 0 for some £ € C k . (4-3) 

Here we have used the fact that x € V implies that x = for some ( e C k . Of course, (4.3) holds 
if and only if A is an eigenvalue of the matrix pencil ( W* AV , WV), i.e., it satisfies the characteristic 
equation 

det(WMF - A W*V) = 0. (4.4) 

In general, (4.4) has k solutions for A, which are known as Ritz values in the literature. Given that 
A' is a Ritz value, the corresponding eigenvector is a solution of the homogeneous system in (4.3). 
The eigenvector approximation corresponding to A' is now x' = V£', and is known as a Ritz vector. 

The different projection methods are characterized by the subspaces V and W that they employ. 
(Note that V and W are also called, respectively, the right and left subspaces.) 
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4.2 The Method of Arnoldi 


In this method V and W are Krylov subspaces given by 

V = n_j = span {u 0 , Au 0 , ..., J 4 fc_1 u 0 } and W = W k -\ = V*_i, (4.5) 

for some arbitrary vector uq. 

Arnoldi has given a very successful implementation of this method. In this implementation the 
vectors A’uoi* = 0, 1,..., are orthogonalized by a very special Gram-Schmidt process as follows: 

Step 0. Let uj = uo/||uo|| 

Step 1. For j = 1, ..., k — 1, do 

Determine the scalar hj + > 0 and the vector Vj +1 , such that ||ty + i|| = 1 and 
hj+i,j v j+i = Avj - hijVi, hij = fa, Avj), 1 < * < j. 


(4.6) 

Thus the N x k matrix V = [vi Jual • • -|u*] is unitary in the sense that V*V is the k x k identity 
matrix. As a result, W*V = V*V = /, and the generalized eigenvalue problem of (4.3) now becomes 


(4.7) 

(4.8) 

i.e., the Ritz values are the eigenvalues of H . 

4.3 The Method of Lanczos 

In this method V and W are the Krylov subspaces 


= Af , 


where H is the k x k upper Hessenberg matrix 

/in h\2 
/l 2 l /l 2 2 
H = | /l32 


h ik 
h 2k 
h$k 

hk,k-i hkk 


V = V k -\= span {uq,Au 0 , ...,A k ^o} and W = W k -\ = span {q, A m q, ...,(A*) k 1 q}, (4.9) 
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for some arbitrary vectors «o and q. 

The algorithm given by Lanczos generates one set of vectors {vi, from the A'uo, i = 

0, 1 — 1, and another set of vectors {t/^, ...,«>*} from the (A*)’g, » = 0, l,...,Jfc — 1, that satisfy 

the biorthogonality condition 

<Vj = Sij, (4.10) 

as long a6 the process does not break down. This is achieved by the following Algorithm: 


Step 0. Set vi = cruo and Wi = rq such that (u>i,i>i) = 1. 
Step 1. For j = 1, ..., k — 1, do 

(a) Compute i ) ;+1 and wj + i by 
Vj + 1 = Avj - ajVj - favj.i 
Wj + i = A*wj - ajWj - 6jWj _ i 
(when j = 1 take /3iv 0 = 6\w 0 = 0) 
with aj = (wj,Avj) 

(b) Choose 6j+i and (3j + 1 such that 

fa+i(3j+i = t>y + i) 

and set 

v i+i = Vj+i/S j+ 1 and w j+l = Wj + if(3 j+l . 


(4.11) 


By (4.10) the matrices V and W satisfy W*V = /. As a result, the generalized eigenvalue problem 
of (4.3) becomes 

(4.12) 


where H is the k x k tridiagonal matrix 

a l /?2 
^2 02 /?3 

fa Ot 4 (34 

H — 

/3k 

fa Olk 


(4.13) 


and the Ritz values are the eigenvalues of H. 
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4.4 The Case of Hermitian A 


The subspaces V in (4.5) and (4.9) are identical. When A is hermitian, i.e., A* = A, and 

q = u 0 , the subspaces W in (4.5) and (4.9) become identical too. Thus the methods of Arnoldi 

and Lanczos become equivalent for the case under consideration. Furthermore, it can be shown 
that the elements h{j of the matrix H in the method of Arnoldi satisfy 7iiJ+l = hi+ij so that 
= h»+i,i > 0 for i = 1, 2, ..., k — 1, while h,j = 0 for j > i + 2. The diagonal elements ha are 

all real. That is to say, in the absence of roundoff, the matrix H is real symmetric tridiagonal. If 

we pick q = and choose Sj = f3j = yj ( i'j , i’j ) in the method of Lanczos, then the matrix H in 
(4.13) turns out to be real symmetric and is exactly the same as the one produced by the method 
of Arnoldi. 

The properties of the Ritz values and Ritz vectors of the Lanczos method, as applied to hermitian 
matrices, have been analyzed by Kaniel [K], Paige[Pai], and Saad [Sal], The paper [Sa2] gives results 
for nonhermitian matrices. 


5 Equivalence of Rational Approximation Procedures and Krylov 
Subspace Methods 

We now go back to the rational approximation procedures SMPE, SMMPE, and STEA. In 
particular, we concentrate on the poles and residues of the rational functions F Ui k{z). 


5.1 Poles of F nt k{z) vs. Ritz Values 


From the determinant representations of F n ^(z) that are given in Theorem 2.2 of [Si6], it follows 
that the denominator Q n ,k(z) of F nt k[z) is a constant multiple of the determinant 


D(X) = 


1 

A 

A* 

u 00 

«01 

- • • u ok 

«10 

u n 

... m k 

Ufc-1,0 

Ufc- 1,1 

. . . 


(5.1) 


where A = z 1 and u,j are as defined in (1.5). This implies that the zeros of the polynomial £)(A) 
are the reciprocals of the zeros of Q n ,k{z), or, equivalently, the reciprocals of the poles of F n ^{z). 
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In addition, they are the roots of a generalized eigenvalue problem as we show next. 


Theorem 5.1: Whatever the u,j, the zeros of the polynomial D( A) in (5.1) are the eigenvalues of 
the matrix pencil ( X , T), where 


«01 

«02 

«OJt 


«00 

«01 

■ • • Wo,fc-l 

«11 

«12 

«1* 

and T- 

«10 

«n 


1 

c 

sr 

1 

h* 

«fc-l,2 • • • 

«*-l ,k . 


. Wfc-1,0 WJt-1,1 

u k-l,k-l 


i.e., they satisfy the equation 

det(X - AT) = 0. 


(5.2) 


(5.3) 


Proof: Multiply the (J — l)st column of 23(A) by A and subtract from the jth column for j = 
k + in this order. This results in 



1 

0--0 



Woo 



D( A) = 

WlO 

X- XT 

= det{X - XT), 


WJt-1,0 




thus proving the claim. □ 


(5.4) 


When Uij are as in (1.5), Theorem 5.1 takes on the following interesting form. 

Theorem 5.2: Define the N x k matrices V and W by 

V = [Wn|w n +l| ' * " |Wn+fc— l] (5-5) 

and 

W = V for SMPE, 

W = [ qi \q 2 \--- | ft ] for SMMPE, (5.6) 

W = [ q \A-q\'..\(A*)*-' q ) forSTEA. 

Then, with u,j as defined by (1.5), the zeros of D{ A) are the eigenvalues of the matrix pencil 

(W*AV, W*V), i.e., they satisfy 


det(WMP - AW*F) = 0. 


(5.7) 
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Consequently, the reciprocals of the poles of the rational approximations F ni k(z ) obtained from the 
SMPE or SMMPE or STEA procedures are the Ritz values of the Krylov subspace methods whose 
right and left subspaces are column spaces of V and W, respectively. 

Proof: Since Theorem 5.1 applies, all we need to show is that X = W*AV and T = W m V 
there. That T = W*V Mows from (1.5), (5.2), (5.5), and (5.6). FYom ( 1 . 5 ), ( 5 . 2 ), and ( 5 . 6 ), we 
similarly have X = PT*[u n+ i| • • •|u n+ j t ]. Using now the fact that « J+1 = Auj, j > 0, we also have 
[ u n+i|*** |un+fc] = AV. Consequently, X = W*AV. Again, from u J+ i = Auj,j > 0, we realize, 
in addition, that the right subspace for all three methods is none other than the Krylov subspace 
span {«„, Au n , ...,i4* _1 u n }. This completes the proof, n 


5.2 Residues of F n ,k(z ) vs. Ritz Vectors 


Turning Theorem 5.2 around, what we have is that the Ritz values obtained by applying 
the Krylov subspace methods whose left and right subspaces are column spaces of V and W, 
respectively, are, in fact, the reciprocals of the poles of the corresponding rational approximations 
En,k(z) to the meromorphic function F(z) = £^ 0 u; 2 \ An immediate question that arises is, of 
course, whether there is any connection between the Ritz vectors and the F ni k(z). The answer, 
which is in the affirmative, is provided in Theorem 5.3 below. 

Theorem 5.3: Let X be a Ritz value of the Krylov subspace methods whose right and left subspaces 
are column spaces of, respectively, V and W in Theorem 5.2. Denote the corresponding Ritz vector 
by x. Let v = -1 in the corresponding rational approximation F„ t k(z), cf. (1.2). Provided X is 
simple, x is a constant multiple of the residue of F n , k (z) at the pole £ = 1 /A. 


Proof: Let us first determine the residue of F n< k(z) at the pole z = 1/A. With v = — 1 

Res F k(z) I { = Idhldi} - SLofll! r ^n+r-i(-£) 

)U=f Q'n, k (*) ~ Q' n ,kW 


(5.8) 


since / 0 that follows from the assumption that X is simple, which implies that 2 is a 

simple pole. By F n + t (z ) = F n -i(z ) + u m z m and £jL 0 c r £*~ r = 0 , we can rewrite ( 5 . 8 ) in 

the form, cf. Section 5 of [Si 6 ], 


n+r-1 


gn+k—l *-l 


RGB F ni k(z)\ z= { — r U m Z m — -pr-, 77 r T) m U n + m , (5.9) 

yn,kV Z ) r =\ m=n 'Ht„,k \ Z ) m=0 


where 


T)m = Cr ^ m ™ = 0,1,...,*- 1. 


(5.10) 
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is a scalar 


Let us now denote rj = (ifa, »?i, Then (5-9) implies that Res F n ,k(z)|* = f 

multiple of Vi). Recall that the Ritz vector corresponding to X is V£, where ( 6 C k and satisfies 
W*(A — A/)V£ = 0, which, on account of Theorem 5.2, is the same as ( X — \T)£ — 0. Thus in 
order to show that ResF ni k(z)|* = ^ is a constant multiple of the Ritz vector corresponding to the 
Ritz value A, it is sufficient to show that 


(. X - XT)r/ = 0. 


From (5.2), the (i + l)st component of the ^-dimensional vector r = (X - XT)i), i = 0, 1, 
is 


Ti 


k-\ 

'X 1 (^i,m+l Au,‘ m )jJ m , 


m=0 


which, by (5.10), becomes 


( 5 . 11 ) 

( 5 . 12 ) 


k - 1 k 

n=EKm+ 1 -Km) E c rA r-m-1 . 

m=0 r=m+l 

Expanding and rearranging this summation, we obtain 



(5.13) 


( 5 . 14 ) 


Recalling that £r=o c r A r = 0, we can rewrite (5.14) as 

k 

Ti — ^ ( 5 . 15 ) 

m=0 

Finally, from the assumption that c* = 1 and from the fact that co,cx,...,cjt_i satisfy the linear 
equations in (1.4), we conclude that 


r, = 0, i = 0, 1 ,. ,.,k— 1 . 


( 5 . 16 ) 


This completes the proof. □ 

5.3 Summary of F n ^{z) vs. Krylov Subspace Methods 

We now combine the results of Theorems 5.2 and 5.3 to state the following equivalence theorem, 
which forms the main result of this section, and one of the main results of this work. 

Theorem 5.4: Let F n> k(z ) be the rational approximation obtained by applying the SMPE or 
SMMPE or STEA procedure to the vector-valued power series £m=o u mZ m , where u m = A m u 0 , m = 
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0,1,..., are power iterations. Denote the reciprocals of the poles of F n ,k(z) by A'j, ..., A'*.. Setting 
v — “ 1 ,n the numerator of F n> k(z), denote the corresponding residues of F n< k(z) by x\, ...,x' k . 
Next, denote by X'j, ... , X' k and xj , x' k , respectively, the Ritz values and corresponding Ritz vectors 
produced by the Krylov subspace methods whose right subspace is span and 

left subspaces are the column spaces of the matrices W in (5.6). Then 

A ; = A ?’ j = (5.17) 

and 

x'j oc x", provided A' is simple. (5.18) 

More can be said about the SMPE and STEA procedures versus the methods of Arnoldi and 
Lanczos, and this is done in Corollary 5.5 below. 

Corollary 5.5: With F n>k (z),X'j,x'j, j = 1 as in Theorem 5.4, let X'j,x'j,j = 1 be the 
Ritz values and Ritz vectors produced by applying the k-step Arnoldi or Lanczos methods to the 
matrix A, starting with the vector u n = A n UQ. (That is to say, replace the initial vector uq in 
Step 0 of ( 4 . 6 ) or (4-U) by the nth power iteration u n .) In addition, let q be the same vector for 
the STEA procedure and the Lanczos method. Then the SMPE and STEA procedures are equiva- 
lent to the methods of Arnoldi and Lanczos, respectively, precisely in the sense of (5.17) and (5.18). 

Now that we have shown the equivalence of the methods of Arnoldi and Lanczos with the 
generalized power methods based on the SMPE and STEA approximation procedures, we realize 
that those results that we proved in Section 3 for the latter and that pertain to the nondefective 
as well as defective eigenvalues of A are, in fact, new results for the former. That is to say, if we 
apply the methods of Arnoldi or Lanczos to the matrix A starting with the nth power iteration 
u « = A n uo for large n, then the Ritz values are approximations to the k largest distinct eigenvalues 
of A counted according to the multiplicities that appear in (2.2). Similarly, the Ritz vectors can be 
used for constructing the approximations to the corresponding invariant subspaces. These points 
will be considered in greater detail in the next section. 

5.4 Optimality Properties of the Arnoldi Method 

In Section 1 we mentioned that the coefficients of c[ n ’^ of the denominator polynomial Q n<k (z) 
of F n ,k(z) for the SMPE procedure are the solution to the optimization problem given in (1.6). If 
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we now pick the vectors u m as the power iterations « m = A m uo, m = 0, 1, then (1.6) reads 


min 

C0.Cl,...,C*_i 


f k-1 

£c,A' + A*)u n 

d=o 


(5.19) 


Exploiting the fact that the method of Arnoldi is equivalent to the generalized power method based 
on the SMPE approximation procedure, we can state the following optimality properties for the 
Arnoldi method as applied to a general matrix A. 


Theorem 5.5: Let A'-, a;', j = 1,2, ..., k, be the Ritz values and appropriately normalized Ritz 
vectors, respectively, produced by applying the k-step Arnoldi method to the matrix A starting with 
the power iteration u„ = A n Uo. Let Vk denote the set of monic polynomials of degree exactly k, 
while Xfc denotes the set of polynomials of degree at most k. Then for k < k 0 , cf. (2-4), 

lir* 1 M 


no* - a;/) 



mui ||/(A)ti„|| = c„,j, 

/€Ffc 


(5.20) 



n m -Ki) 

t=i 

w 


u 


TM 


(A - X ;/)x' = (£ c^»A‘ + X‘) „„ = £ cS n, ‘ ) 

\t=0 / t=0 


^n+i "h k > 


(5.21) 


(5.22) 


||(A-A'/)x;|| = min ||(A - A/) 5 (A)« n ||, 

A€C, s eP*-i 

= mm ||(A- A/)x'||, 

= ll(^-^/)5(A)u n ||, 

= e n> k independently of j , (5.23) 

and 

((A - A '•/)*', g(A)u n ) = 0, all g 6 **_i. (5.24) 

For k = ko, we have Ax'j = AJx' . 


Proof: We start by noting that (5.24) is nothing but a restatement of the requirement that 
Ax'j - A jij be orthogonal to the left subspace of the Arnoldi method, which is also its right sub- 
space V = {s(A)u n : g e x*_!}. 
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Since the Ritz values A', j = 1, k, are the zeros of the monk polynomial fc (A) = 
Z)?=o cj n ’^A* + A*, we can write 

On,k{ A) = JJ(A — AJ). (5.25) 

»=1 

Thus 

k- i k 

On ,k( A ) = Y c i + A k = JJ(^4 - A;/). (5.26) 

1=0 i=l 

Combining (5.26) with (5.19), we obtain (5.20). 

Provided a;'- is as given by (5.21), the proofs of (5.22) and (5.23) are immediate. 

To prove the validity of (5.21) it is sufficient to show that x'j 6 V and that (A - A '•/)*'• is 
orthogonal to all the vectors in V. That a:' G V is obvious from (5.21) itself. The fact that 
c!"* fc) , * = 0,1,..., A; - 1, are the solution of the optimization problem in (5.19) implies that the 
vector 0 n ,k(A)u n is orthogonal to every vector in V. But <$„ ifc (A)u n = (A - A' J)x', as can be seen 
from (5.26). This completes the proof. □ 

Note that the proofs of (5.20) and (5.21) for hermitian matrices can also be found in [Par2, 
Chap. 12, pp. 239-240]. 

A few historical notes on the methods of Arnoldi and Lanczos are now in order. 

Following the work of Arnoldi the equivalent form in (5.19) was suggested in a paper by Erdelyi 
[E], in the book by Wilkinson [W, pp. 583-584], and in the papers by Manteuffel [M] and Sidi and 
Bridger [SiBr]. The equivalence of the different approaches does not seem to have been noticed, 
however. For instance, [W] discusses both approaches without any attempt to explore the connec- 
tion between them. With the exception of [SiBr], these works all consider the case n = 0. The case 
n > 0 and the limit as n — ♦ oo are considered in [SiBr] and [Si3]. 

In his discussion of the power iterations in [H, Chap. 7], Householder gives determinantal 
representations of certain polynomials whose zeros are approximations to the largest eigenvalues 
of the matrix being considered. One of these representations, namely the one given in Eq. (16) 
in [H, p. 186], coincides with the determinant D{ A) in (5.1) of the present work pertaining to the 


26 



STEA approximation procedure with n > 0. It is shown there that the zeros of D( A) tend to the 
k largest eigenvalues of the matrix A as n — ► oo, but a theorem as detailed as our Theorem 3.1 is 
not given. It is also mentioned in the same place that, apart from a constant multiplicative factor, 
the polynomials D{ A) with n = 0 are precisely the so-called Lanczos polynomials given in Eq. (10) 
of [H, p. 23] that are simply det(A / - H ) with H as given in (4.13). As we pointed out in this 
section, up to a constant multiplicative factor, D( A) with n > 0 is itself the Lanczos polynomial 
det(A / — H ) when the Lanczos method is being applied with uq replaced by u n = A n u 0 . It is not 
clear to the author whether this connection between D{ A) with n > 0 and the Lanczos method has 
been observed before or not. 


6 Stable Numerical Implementations 

In this section we concentrate on the implementation of the generalized power methods based on 
the SMPE and the STEA approximation procedures as these are related to the methods of Arnoldi 
and Lanczos respectively, and as good implementations for the latter are known. For example, the 
implementations in (4.6) and (4.11) are usually quite stable. 

6.1 General Computational Considerations 

The theoretical results of Section 3 all involve the limiting procedure n — *■ oo. When |Ai| 
is larger (smaller) than 1, we may have difficulties in implementing the procedures above due to 
possible overflow (underflow) in the computation of the vectors u m for large m. This situation can 
be remedied easily as will be shown below. 

We first observe that the denominator polynomial Q n ,k(z) of the vector- valued rational approx- 
imation F nt k(z) remains unchanged when the vectors u n ,u n+ i,u n+2 , ..., are all multiplied by the 
same scalar, say a, and so do its zeros. Consequently, the vectors dji(n) defined in Theorem 3.2 
remain the same up to the multiplicative factor a. That is to say, as far as the matrix eigenvalue 
problem is concerned, multiplication of the vectors u„ , u n+ i, ..., by the scalar a leaves the eigenvalue 
approximations unchanged and multiplies the eigenvector approximations by a. 

For the purpose of numerical implementation we propose to pick a = l/||u n ||, and we achieve 
this by the following simple algorithm that is also used in the classical power method: 
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Step 0. Pick «o arbitrarily such that ||uo|| = 1. 

Step 1. For m = l,2,...,n do 
w m = Au m _ x 

«m = «Wll w m||* 

( 6 . 1 ) 

Once the vector u n has been determined in this way, we apply the Jfc-step Arnoldi or Lanczos 
methods to the matrix A with this u n as the starting vector, and obtain the k Ritz values and the 
corresponding Ritz vectors. 

6.2 Treatment of Nondefective Eigenvalues 

If A y, one of the largest t distinct nonzero eigenvalues of A that contribute to the power iterations 
u m exactly as in (2.2), is nondefective, i.e., it has wy = 1, then, under the conditions of Theorem 
3.1, there is precisely one Ritz value Ay(n) that tends to Ay with Ay(n) - Ay = 0(nP|A t+1 /Ay| n ) as 
n oo if A is nonnormal and Ay(n) - Ay = 0(|A* + i/Ay| 2n ) as n -* oo if A is normal. If xy is 
the eigenvector corresponding to Ay, then the Ritz vector xy(n) corresponding to Ay(n) tends to xy 
with limsup n _ too ||xy(n) — Xy||» < |At+i/Ay| in all cases, by Theorem 3.2. Thus the Ritz value and 
the corresponding Ritz vector are the required approximations to the eigenpair (Ay,xy). 

6.3 Treatment of Defective Eigenvalues 

When the eigenvalue Ay is defective and has wy > 1 in (2.2), then, under the conditions of 
Theorem 3.1, there are precisely wy Ritz values A y/(n), 1 < / < wy, that tend to Ay, each with the 
rate of convergence 0([n*’|A 1+1 /Ay|] n / a '>) as n -» oo. That is to say, the Ritz values for a defective 
eigenvalue are not as effective as the ones for nondefective eigenvalues. However, Ay(n) and Ay(n) 
that are defined in Theorem 3.1 do enjoy the property that they tend to Ay with the optimal rate 
of convergence 0(nP |A*+i/Ay|") as n — ► oo, as in the case of a nondefective eigenvalue. 

As for the invariant subspaces Yi, i = 0, l,...,py, py = wy - 1, the most basic result to use is 
Theorem 3.2. Acording to this theorem and the subsequent developments, the building blocks for 
the invariant subspaces are the vectors dy,-,j( n ) that are defined by (3.19). Now the vector dy,,/(n) 
is a constant multiple of Res F n> k(z ) where 2 ji(n) = l/Ay/(n), which, when v = -1, is a 
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constant multiple of the Ritz vector corresponding to Aj/(n) by Theorem 5.4. That is, once the 
Ritz vectors have been computed, they can be used to construct the vectors d Jt) i(n), which, in turn, 
are used in constructing the approximate invariant subspaces Y| with optimal accuracy. 

Let us now show how the vector dji ,l( n ) is expressed in terms of the corresponding Ritz vector. 
For simplicity of notation we shall write z = Zji(n) - l/A_,-,(n). The Ritz vector corresponding to 
A ji{n) is A = where tq = u n and (u n , u n ) = 1 by (6.1). We recall that for the method 

of Arnoldi the vectors t>i , vj, ..., t>k are actually the ones that would be obtained by orthogonalizing 
the power iterations u n , Au n , ..., A k ~ 1 u n by the Gram-Schmidt process. For the method of Lanczos 
the vectors Vi,«2>— are obtained by biorthogonalizing u n , Au„, ..., A k ~ l u n against the vectors 
q, A*q, ...,(A*) k ~ 1 q. In both cases we have 

AV = VH + R, (6.2) 

where H is the upper Hessenberg matrix of (4.8) for the Arnoldi method or the tridiagonal matrix 
of (4.13) for the Lanczos method, and thus it is upper Hessenberg in both cases. The matrix R has 
all of its first k — 1 columns equal to zero, and its A: t li column is ,k v k+\ ■ 

From the way the vectors Vi, v ?, ..., v* are constructed it is easy to see that 

V = [u n |Au n | •••|A fc_1 u n ] B, (6.3) 

where B is the upper triangular matrix 

P\\ Pu ' • • Pik 

022 ••• 02k 

Pkk 

whose entries fa are required. Substituting (6.3) in (6.2), we have 

[Au n \ A 2 u n | • • • | A k u n ]B = [« n | Au n \ ■ ■ ■ | A k ~'u n ]BH + R. (6.5) 

By equating the jth columns of both sides of (6.5) for j < Ar, we obtain 

n)fa = Y,( Aiu n)(BH) i+1 J ( 6 . 6 ) 

i=l 1=0 

as the matrices B and BH are upper triangular and upper Hessenberg, respectively. From the 
linear independence of the vectors A'u n ,i = 0, 1, ..., k- 1, (6.6) reduces to 

Pi} = (BH)i+i,j} 0<i< j ; Poj = 0 all j > 1. (6.7) 
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Now fa = 1 since t>i = These equations can be solved in the order i = 0, 1, ..., j, j = 
1,2,... ,k - 1, which amounts to computing the 1st, 2nd,...,fcth columns of the matrix B, in this 
order. This can be accomplished as hj +1 j > 0 for all j. Thus by letting t = 0 in (6.7), we obtain 
YZtiPirhr j = 0, which we solve for fa+i. Next, letting i = 1, we obtain fa = fah rj , 
which we solve for faj+ j. By letting i = 2,3, ...,j, we obtain A+u+i, t = 2,3, ...,j, in this order. 


Suppose that the Ritz vector x has been computed in the form £ = £* =1 and that the 

have been saved. Then, recalling also that u n+l - = A'u n , i = 0, l,...,Jfc - 1, 


and the coefficient of tt n is given by 


fc-i 

x = a;u„ +i , 

t '=0 


-'52 faii- 

3-1 


Similarly, from (3.19), the coefficient of u n in dj^i(n) (setting v — —1 there) is given by 


ct k) z k 


■ - (i - c ' w)i &■ 


( 6 . 8 ) 


(6.9) 


(6.10) 


Now if we denote the Ritz values by Ai, ..., X' k and set z[ = 1/AJ-, t = 1, ..., k, then we can show that 


= “(* “ 0(«))‘- 


n (i -z' T /i) 

r—\ 


( 6 . 11 ) 


so that 


4 v (») = = ( ~ - C,(n)) ' 

(To 


k EUPuti 

n (i -z'jz) 

r=l 

*'r** 


( 6 . 12 ) 


which is the desired result. 


With this we can now go on to compute the approximations to the eigenvector aj P} and the 
vectors aji, 0 < i < Pj, precisely as described in Sections 3.2.1 and 3.2.2, respectively. For example, 
the vector dj Pi ( n ) = J2i=i ^ip 3 A n ) * s the approximation to the eigenvector a ]f>j the error in which 
is, roughly speaking, 0(|At + i/Aj|") as n — ► oo. 
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